Goal-oriented generative script learning aims to generate subsequent steps based on a goal, which is an essential task for assisting robots in performing stereotypical daily activities. We show that the performance of this task can be improved if historical states are captured not only by the given language instructions but also augmented with the additional information provided by accompanying images. Therefore, we propose a new task, Multimedia Generative Script Learning, which generates subsequent steps by tracking historical states in both the text and vision modalities, and we introduce the first benchmark for it, containing 2,338 tasks and 31,496 steps. We aim to generate scripts that are visual-state trackable, inductive for unseen tasks, and diverse in their individual steps. We propose to encode visual state changes through a selective multimedia encoder, to transfer knowledge from previously observed tasks using a retrieval-augmented decoder, and to present distinct information at each step by optimizing a diversity-oriented contrastive learning objective. We define metrics to evaluate both generation quality and inductive quality. Experimental results demonstrate that our approach significantly outperforms strong baselines.
translated by Google Translate
Predicting event sequences is critical for many real-world applications in information retrieval and natural language processing. Within event sequence prediction, future event generation (FEG) is a challenging task because it requires not only fluent text generation but also commonsense reasoning to maintain the logical coherence of the entire event story. In this paper, we propose COEP, a novel explainable FEG framework. It highlights and integrates two types of event knowledge: sequential knowledge of direct event-event relations, and inferential knowledge that reflects the intermediate character psychology between events (e.g., intents, causes, reactions), which intrinsically pushes the story forward. To alleviate the knowledge-forgetting issue, we design two modules, IM and GM, one for each type of knowledge, which are combined via prompt tuning. First, IM focuses on understanding inferential knowledge to generate commonsense explanations and to provide soft prompt vectors for GM. We also design a contrastive discriminator for better generalization ability. Second, GM generates future events by modeling direct sequential knowledge under the guidance of IM. Automatic and human evaluations demonstrate that our approach can generate more coherent, specific, and logical future events.
Recurrent neural networks (RNNs) are among the fundamental architectures of deep learning. Recently, several works have studied the training process of over-parameterized neural networks and shown that over-parameterized networks can learn functions in some notable concept classes with provable generalization error. In this paper, we analyze the training and generalization of randomly initialized RNNs and provide the following improvements over recent works: 1) For an RNN with input sequence $x = (x_1, x_2, \dots, x_L)$, previous works study learning functions that are sums of $f(\beta_l^T x_l)$ and require the normalization condition $\|x_l\| \le \epsilon$ for some very small $\epsilon$ depending on the complexity of $f$. In this paper, using a detailed analysis of the neural tangent kernel matrix, we prove a generalization error bound for learning such functions without the normalization condition, and show that some notable concept classes are learnable with the numbers of iterations and samples scaling almost-polynomially in the input length $L$. 2) Moreover, we prove a novel result for learning $N$-variable functions of the input sequence of the form $f(\beta^T [x_{L_1}, \dots, x_{L_N}])$, which do not belong to the "additive" concept class, i.e., sums of functions $f(x_l)$. We show that when either $N$ or $L_0 = \max(L_1, \dots, L_N) - \min(L_1, \dots, L_N)$ is small, $f(\beta^T [x_{L_1}, \dots, x_{L_N}])$ is learnable with the numbers of iterations and samples scaling almost-polynomially in the input length $L$.
The development of deep learning models in medical image analysis is largely limited by the lack of large, well-annotated datasets. Unsupervised learning does not require labels and is therefore better suited to medical image analysis problems. However, most current unsupervised learning methods need to be applied to large datasets. To make unsupervised learning applicable to small datasets, we propose Swin MAE, a masked autoencoder with Swin Transformer as its backbone. Even on a dataset of only a few thousand medical images, and without using any pre-trained models, Swin MAE is still able to learn useful semantic features purely from images. In terms of transfer-learning results on downstream tasks, it equals or even slightly outperforms the supervised model obtained by training Swin Transformer on ImageNet. The code will be publicly available soon.
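As a rough illustration of the masked-autoencoder setup described above, the following sketch splits an image into non-overlapping patches and randomly hides a fixed fraction of them; the 4×4 patch size and 75% mask ratio are illustrative assumptions, not values from the paper:

```python
import numpy as np

def mask_patches(image, patch=4, mask_ratio=0.75, seed=0):
    """MAE-style masking: split an image into non-overlapping patches
    and randomly hide a fixed fraction of them.

    Returns the flattened visible patches, their indices, and the
    binary mask (1 = masked) that the decoder must reconstruct.
    """
    h, w = image.shape
    patches = image.reshape(h // patch, patch, w // patch, patch)
    patches = patches.transpose(0, 2, 1, 3).reshape(-1, patch * patch)

    n = patches.shape[0]
    n_masked = int(n * mask_ratio)
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    masked_idx, visible_idx = perm[:n_masked], perm[n_masked:]

    mask = np.zeros(n, dtype=int)
    mask[masked_idx] = 1
    return patches[visible_idx], visible_idx, mask

# A 16x16 toy "image" yields 16 patches; 12 are hidden, 4 stay visible.
visible, idx, mask = mask_patches(np.arange(256.0).reshape(16, 16))
print(len(visible), int(mask.sum()))  # 4 12
```

Only the visible patches would be fed to the encoder; the decoder is trained to reconstruct the masked ones, which is what lets the model learn from unlabeled images.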
Instruction tuning, a new learning paradigm that fine-tunes pre-trained language models on tasks specified through instructions, has shown promising zero-shot performance on various natural language processing tasks. However, it has not yet been explored for vision and multimodal tasks. In this work, we introduce MultiInstruct, the first multimodal instruction-tuning benchmark dataset, consisting of 47 diverse multimodal tasks covering 11 broad categories. Each task is designed with at least 5,000 instances (input-output pairs) drawn from existing open-source datasets and 5 expert-written instructions. We take OFA as the base pre-trained model for multimodal instruction tuning, and to improve its performance we explore multiple transfer-learning strategies that leverage the large-scale Natural Instructions dataset. Experimental results demonstrate strong zero-shot performance on various unseen multimodal tasks and the benefit of transfer learning from text-only instructions. We also design a new evaluation metric, Sensitivity, to evaluate how sensitive the model is to the variety of instructions. Our results indicate that the model becomes less sensitive to varying instructions after fine-tuning on a diverse set of tasks and instructions for each task.
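The Sensitivity metric is only named above, not defined. One plausible formulation, assumed here purely for illustration, measures how much a model's score varies across instruction paraphrases of the same task, relative to its mean score:

```python
import statistics

def sensitivity(scores_per_task):
    """A hypothetical reading of a sensitivity metric (an assumption,
    not the paper's definition): for each task, take the spread of
    performance across its instruction variants relative to the mean
    (coefficient of variation), then average over tasks. Lower values
    mean the model is more robust to how an instruction is phrased.
    """
    per_task = []
    for scores in scores_per_task:
        mean = statistics.mean(scores)
        per_task.append(statistics.pstdev(scores) / mean if mean else 0.0)
    return statistics.mean(per_task)

# Two hypothetical tasks, each evaluated with 3 instruction paraphrases.
robust = sensitivity([[0.80, 0.81, 0.79], [0.60, 0.60, 0.60]])
brittle = sensitivity([[0.80, 0.40, 0.60], [0.70, 0.20, 0.45]])
print(robust < brittle)  # the robust model varies less across instructions
```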
Dense retrievers have made significant strides toward state-of-the-art results on text retrieval and open-domain question answering (ODQA). Yet most of these achievements were made possible with the help of large annotated datasets; unsupervised learning for dense retrieval models remains an open problem. In this work, we explore two categories of methods for creating pseudo query-document pairs, query extraction (QExt) and transferred query generation (TQGen), to augment retriever training in an annotation-free and scalable manner. Specifically, QExt extracts pseudo queries from document structures or by selecting salient random spans, while TQGen utilizes generation models trained for other NLP tasks (e.g., summarization) to produce pseudo queries. Extensive experiments show that dense retrievers trained with individual augmentation methods perform comparably to multiple strong baselines, and that combining them leads to further improvements, achieving state-of-the-art unsupervised dense retrieval performance on both the BEIR and ODQA datasets.
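A minimal sketch of the random-span flavor of QExt described above (the span lengths, the number of queries per document, and the sampling details are illustrative assumptions): each sampled span becomes a pseudo query paired with its source document, yielding training pairs with no annotation.

```python
import random

def pseudo_queries(document, n_queries=3, span_len=(3, 6), seed=0):
    """Sample short token spans from a document and pair each span with
    the document as an annotation-free (pseudo query, document) pair.
    """
    rng = random.Random(seed)
    tokens = document.split()
    pairs = []
    for _ in range(n_queries):
        length = rng.randint(*span_len)
        start = rng.randint(0, max(0, len(tokens) - length))
        query = " ".join(tokens[start:start + length])
        pairs.append((query, document))
    return pairs

doc = ("Dense retrievers encode queries and documents into a shared "
       "vector space and rank documents by similarity to the query.")
pairs = pseudo_queries(doc)
for q, _ in pairs:
    print(q)
```

A real implementation would additionally score span salience rather than sampling uniformly, but the data format produced is the same.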
Pre-trained multilingual language models show significant performance gains for zero-shot cross-lingual model transfer on a wide range of natural language understanding (NLU) tasks. Previously, for zero-shot cross-lingual evaluation, pre-trained models were fine-tuned only on English data and tested on a variety of target languages. In this paper, we conduct cross-lingual evaluation on various NLU tasks (sentence classification, sequence labeling, question answering) using prompt tuning and compare it with fine-tuning. The results show that prompt tuning achieves much better cross-lingual transfer than fine-tuning across datasets, with only 0.1% to 0.3% of parameters tuned. Additionally, our analysis demonstrates that prompt tuning yields representations with better cross-lingual transferability and better-aligned decision boundaries on downstream tasks.
Compared with natural images, medical images are difficult to acquire and costly to label. As an unsupervised learning method, contrastive learning can make more effective use of unlabeled medical images. In this paper, we use a Transformer-based contrastive learning method and innovate on the contrastive learning network through transfer learning. The resulting model is then transferred to the downstream parotid segmentation task, which improves the performance of the parotid segmentation model on the test set. The improved model achieves a DSC of 89.60%, an MPA of 99.36%, an MIoU of 85.11%, and an HD of 2.98. All four metrics show significant improvement compared with using a supervised-learning model as the pre-trained model for the parotid segmentation network. In addition, we find that the improvement the contrastive learning model brings to the segmentation network lies mainly in the encoder part, so this paper also attempts to build a contrastive learning network for the decoder part and discusses the problems encountered during its construction.
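The abstract does not state the training objective, so as a generic illustration of the contrastive-learning idea it relies on, here is a standard InfoNCE-style loss in NumPy (a common formulation, not necessarily the paper's exact one): each image embedding should be most similar to an embedding of its own augmented view, with other images in the batch acting as negatives.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE contrastive loss: cross-entropy over cosine similarities,
    where anchor i's correct "class" is positive i in the batch.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # match anchor i to positive i

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
aligned = info_nce(x, x + 0.01 * rng.normal(size=(8, 16)))
random_pairs = info_nce(x, rng.normal(size=(8, 16)))
print(aligned < random_pairs)  # aligned views give a lower loss
```

Minimizing such a loss pulls embeddings of two views of the same unlabeled image together, which is what lets the pre-trained encoder transfer to the segmentation task.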
Parotid tumors account for approximately 2% to 10% of head and neck tumors. Preoperative tumor localization and differential diagnosis are important for the subsequent selection of appropriate treatment for parotid tumors. However, the relative rarity of these tumors and their highly dispersed tissue types have left an unmet need for fine-grained differential diagnosis of such tumor lesions based on preoperative radiomics. Recently, deep learning methods have developed rapidly, and Transformers in particular have beaten traditional convolutional neural networks in computer vision; many new Transformer-based networks have been proposed for computer vision tasks. In this study, multi-center, multimodal MRI images of the parotid gland were collected. The Transformer-based Swin-Unet was used. MRI images of the STIR, T1, and T2 modalities were merged into three-channel data to train the network. We achieved segmentation of the regions of interest of the parotid gland and tumors. The model achieved a DSC of 88.63%, an MPA of 99.31%, an MIoU of 83.99%, and an HD of 3.04 on the test set. A series of comparative experiments was then designed to further validate the segmentation performance of the algorithm.
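The DSC figure reported above is the Dice similarity coefficient, the standard overlap metric for segmentation masks; for binary masks it is simply 2|A∩B| / (|A|+|B|), as in this sketch:

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice similarity coefficient (DSC) between two binary masks:
    2 * |A ∩ B| / (|A| + |B|). The eps term guards against empty masks.
    """
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return 2.0 * intersection / (pred.sum() + target.sum() + eps)

# Toy 1-D masks standing in for segmentation outputs.
pred = np.array([1, 1, 1, 0, 0])
target = np.array([1, 1, 0, 0, 1])
print(round(dice_coefficient(pred, target), 3))  # 2*2 / (3+3) -> 0.667
```

MPA, MIoU, and HD (mean pixel accuracy, mean intersection-over-union, and Hausdorff distance) are computed analogously from the same predicted and ground-truth masks.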
Lifelong event detection aims to incrementally update a model with new event types and data while retaining its capability on previously learned old types. One critical challenge is that the model catastrophically forgets old types when continually trained on new data. In this paper, we introduce Episodic Memory Prompts (EMP) to explicitly retain task-specific knowledge. Our method adopts continuous prompts for each task, which are optimized to instruct the model's predictions and to learn event-specific representations. The EMPs learned in previous tasks are carried along with the model into subsequent tasks, and can serve as a memory module that retains old knowledge and transfers it to new tasks. Experimental results demonstrate the effectiveness of our method. Furthermore, we conduct a comprehensive analysis of the new and old event types in lifelong learning.